Recalling Assignment 3- Task Abstraction

knitr::opts_chunk$set(echo=FALSE, warning=FALSE, error=FALSE, message=FALSE)

Import Data

Pig	IL1B_Jej_pg_mL	IL1B_Ile_pg_mL	IL1B_Col_pg_mL	TNFA_Jej_pg_mL	TNFA_Ile_pg_mL	TNFA_Col_pg_mL	IL8_Jej_pg_mL	IL8_Ile_pg_mL	IL8_Col_pg_mL
P1	158.107	258.529	18.902	10.084	191.685	67.410	2717.095	3244.064	48.389
C1	304.570	300.510	173.388	5.293	148.321	53.500	1807.191	2747.520	521.019
HM1	126.749	92.684	80.020	0.000	163.895	0.000	2163.072	1842.670	93.936
HM2	90.214	72.580	75.545	0.000	130.676	0.000	3434.170	3433.543	194.084
HM3	114.224	79.751	239.901	0.000	0.000	28.050	1649.797	1678.069	334.678
HM4	132.318	107.964	155.655	3.612	5.265	18.105	3390.124	3088.125	373.427
HM5	68.657	49.695	12.045	0.000	112.351	0.000	3148.633	3153.588	172.843
HM6	112.428	57.860	52.582	0.000	0.000	0.000	3088.206	4084.593	183.157
P2	194.675	145.165	63.666	12.729	18.287	89.610	2474.794	2692.826	880.425
C2	401.575	564.465	305.068	42.442	17.261	84.509	1635.162	1030.576	255.938
IF1	174.037	130.035	38.956	0.000	105.243	3.978	2013.397	1678.351	49.232
IF2	126.663	115.666	76.857	0.000	63.294	4.975	3399.788	1727.786	116.573
IF3	141.120	53.064	182.891	4.574	0.000	12.948	3766.222	504.454	38.554
IF4	132.275	78.283	18.266	9.628	154.257	0.000	3705.406	2293.403	222.193
IF5	111.800	91.316	82.741	0.000	223.890	4.293	3228.713	2499.064	103.378
IF6	159.411	120.352	31.099	0.000	5.171	0.000	2001.489	2659.008	173.974

I am adding in my DATA DICTIONARY from Assignment 2:

Data Dictionary:

Flat Table

Just one Excel sheet with items and attributes.

Items (rows) = RStudio calls these observations.

In Cytokine_summary, there are 16 pigs. P1 and P2 were pilot pigs fed piglet milk replacer formula. C1 and C2 were farm control pigs (siblings) raised at a farm (the same one as the other pigs), feeding from their own mom, and then we received them for necropsy on day of life (DOL) 28. HM1-6 were fed human milk for 28 days in our lab. IF1-6 were fed infant formula for 28 days in our lab. Pairs of HM and IF (such as HM1 and IF1) were siblings and both raised at the same time, but in different cages.

Attributes (columns) = RStudio calls these variables.

In Cytokine_summary, there are 10 variables. One of these columns = “Pig” and specifies the observations described above. All other variables are cytokine values from ELISAs on various intestinal tissues harvested from the pigs at necropsy on DOL 28 (except HM/IF5; they had to stay with us a little longer). The cytokines detected by ELISA included IL1B, TNFA, and IL8. Each of these cytokines was measured in the jejunum (Jej), ileum (Ile), and colon (Col) of each pig. Concentrations for each measurement are in pg/mL.

Load Libraries

Now that I have the necessary data and packages, I want to make a box plot distribution of my various cytokines per pig feeding group - HM, IF, P, and C. Barrie helped me with this!

Organize the Data

Here we add another column, Diet, to Cytokine_summary in R, based on the prefix of each pig's ID.
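A minimal sketch of the idea, assuming the diet group is always the letters that come before the trailing number in a pig ID (P1, HM3, IF6, and so on). It uses base R `sub()` to strip the digits, which is an alternative to matching each prefix separately with `grepl()`; the `pig` and `diet` vectors are illustrative names, not objects from the post.

```r
# A handful of pig IDs as they appear in the Pig column of Cytokine_summary
pig <- c("P1", "C1", "HM1", "HM2", "P2", "C2", "IF1", "IF6")

# Drop the trailing digits so only the group prefix is left
diet <- sub("[0-9]+$", "", pig)

# Count how many of these pigs fall into each derived diet group
table(diet)
```

One caveat with this shortcut: any ID that does not follow the letters-then-number pattern is passed through unchanged rather than set to NA, so it is worth checking `table(diet)` for unexpected groups.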

What does each diet look like?

Figure 1. Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokine IL1B in the jejunum. Jitter overlay is representative of each individual pig.

Legend:

C = Farm Control

HM = Human Milk

IF = Infant Formula

P = Lab Control (pilot pigs fed piglet milk replacer)

Now I could create individual ones of these for each column… but we are going to try to work smarter, not harder, and create a new data frame to get all these types of plots into 1 figure.

Creating 9 plots in 1 figure

First we break up columns 2-10 into their cytokine and their region.
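As a rough sketch of what this reshaping does, here are two rows and the three IL1B columns copied from the table above, pivoted to long format. This version assumes every measurement column is named `Cytokine_Region_pg_mL` and uses the `names_pattern` argument of `tidyr::pivot_longer()` to pull the two pieces out in one step; the `wide` and `long` names are only for illustration.

```r
library(tidyr)

# Two pigs and the IL1B columns, copied from the Cytokine_summary table
wide <- data.frame(
  Pig = c("P1", "HM1"),
  IL1B_Jej_pg_mL = c(158.107, 126.749),
  IL1B_Ile_pg_mL = c(258.529, 92.684),
  IL1B_Col_pg_mL = c(18.902, 80.020)
)

# One row per pig x measurement; the column name is split into Cytokine and Region
long <- pivot_longer(
  wide,
  cols = -Pig,
  names_to = c("Cytokine", "Region"),
  names_pattern = "^([^_]+)_([^_]+)_pg_mL$",
  values_to = "Expression"
)

long
```

Each pig now contributes one row per measurement, which is the shape ggplot2 needs in order to facet by Cytokine and Region.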

Cytokine_long looks good!

Now to put that into a boxplot with jitter overlay. We facet_wrapped in order to make subplots from 1 plot (slice it up for the viewers).

Figure 2. Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokines IL1B, IL8, and TNFA in the jejunum, ileum, and colon. Jitter overlay is representative of each individual pig.

Great!

Onto Assignment 4- Manipulating Marks and Channels

Moving forward from Figure 2, we wanted to see if there were any individual pigs driving differences.

Plot looking at individual pigs and cytokine z scores

Make new data frame for Cytokine z-scores
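For reference, a z score is just (value - column mean) / column standard deviation, so each measurement is expressed relative to the other pigs for that same cytokine and region. A small sketch with four IL1B jejunum values copied from the table above, assuming the sample standard deviation (which is what base R `sd()` and `scale()` use); the vector names are illustrative.

```r
# IL1B jejunum values (pg/mL) for four pigs, copied from Cytokine_summary
il1b_jej <- c(P1 = 158.107, C1 = 304.570, HM1 = 126.749, IF1 = 174.037)

# Manual z score: subtract the mean, divide by the sample standard deviation
z_manual <- (il1b_jej - mean(il1b_jej)) / sd(il1b_jej)

# scale() does the same centering and scaling but returns a one-column matrix,
# so as.vector() is used to get a plain numeric vector back
z_scale <- as.vector(scale(il1b_jej))

# Both routes should agree
all.equal(unname(z_manual), z_scale)
```

Because the scaling is done column by column, z scores from cytokines with very different pg/mL ranges (IL8 versus TNFA, for example) end up on a comparable axis.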

Make a boxplot now:

Figure 3. Box and whisker plots of individual pigs and their overall cytokine expression z scores. Colors represent respective cytokines.

Super cool! Each pig has 9 dots = 9 cytokine readings (3 cytokines and 3 regions). There don’t appear to be any real outlier pigs as a whole (i.e. none are extremely inflamed or non-inflamed for any measure). This shows us there doesn’t appear to be hidden structure in my data.

ACTION = SEARCH

TARGET = ALL DATA

Now for manipulating…

What if I add the CHANNEL of another dimension of FILL/COLOR to Figure 3?

Figure 4. Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors represent regions.

Actually, this might be quite helpful! Hmm… different colors?

Figure 5. Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors (black, gray, white) represent regions.

Helpful or a hindrance, I don’t know!

How about changing the MARK of SHAPE?

Figure 6. Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors represent regions. Points are drawn as triangles.

Alright… I am having too much fun here. Lastly, for my micro-pig-ome data, Barrie and I worked on relative abundance stacked barcharts. I won’t go into it too much, but the data I am importing below are microbiome samples (from piglet fecal samples) representative of various timepoints. I want to manipulate the CHANNEL of COLOR to be a little more discriminating.

The following was copied from my pre_decontam project:

Figure 7. Piglet microbiome relative abundances with poor color

I am going to manipulate color now!

Figure 7. Piglet microbiome relative abundances with better color!

In summary

Expressiveness and Effectiveness

Shown by Figures 3, 4, 5, and 6.

Figure 3 and 5 = best, 4 and 6 = meh!

Discriminability

Figure 4 fine-tuned Figure 3 by GI region.

However, Figure 5 tried to add in more colors, and ultimately, this just added to the cognitive load. Not sure how to use Region helpfully without overstimulating/confusing.

Separability

Head to Figure 7. Gut microbiome stacked barcharts are a little easier to distinguish with the updated color scheme.

Popout

We were searching for popout in Figure 3. I do notice C2 is a bit of a bigger box compared to the rest.

Thanks! TTFN!

---
title: "BCB 520 Assignment 4"
subtitle: "Marks and Channels"
author: "Heidi Sellmann"
date: "2024-02-08"
categories: [Assignments, Data Viz]
image: "cytoswine.jpg"
code-fold: true
code-tools: true
description: "My Cytoswine and micro-pig-ome data"
format: html
editor: visual
---

# Recalling Assignment 3- Task Abstraction

```{r}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, error=FALSE, message=FALSE)
```

## Import Data

```{r Import Data}
library(readxl)

# Read the cytokine ELISA summary sheet
Cytokine_summary <- read_excel("Cytokine_summary.xlsx")

# Inspect the table in the RStudio viewer, then print it in the rendered page
View(Cytokine_summary)
knitr::kable(Cytokine_summary)
```

I am adding in my **DATA DICTIONARY** from Assignment 2:

## Data Dictionary:

### Flat Table

Just one Excel sheet with **items** and **attributes**.

### Items (rows) = RStudio calls these observations.

In Cytokine_summary, there are 16 pigs. P1 and P2 were pilot pigs fed piglet milk replacer formula. C1 and C2 were farm control pigs (siblings) raised at a farm (the same one as the other pigs), feeding from their own mom, and then we received them for necropsy on day of life (DOL) 28. HM1-6 were fed human milk for 28 days in our lab. IF1-6 were fed infant formula for 28 days in our lab. Pairs of HM and IF (such as HM1 and IF1) were siblings and both raised at the same time, but in different cages.

### Attributes (columns) = RStudio calls these variables.

In Cytokine_summary, there are 10 variables. One of these columns = "Pig" and specifies the observations described above. All other variables are cytokine values from ELISAs on various intestinal tissues harvested from the pigs at necropsy on DOL 28 (except HM/IF5; they had to stay with us a little longer). The cytokines detected by ELISA included IL1B, TNFA, and IL8. Each of these cytokines was measured in the jejunum (Jej), ileum (Ile), and colon (Col) of each pig. Concentrations for each measurement are in pg/mL.

## Load Libraries

```{r Load libraries, message=FALSE}
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(rstatix)
library(dplyr)
```

Now that I have the necessary data and packages, I want to make a box plot distribution of my various cytokines per pig feeding group - HM, IF, P, and C. Barrie helped me with this!

## Organize the Data

Here we add another column, Diet, to Cytokine_summary in R, based on the prefix of each pig's ID.

```{r Add in diet column}
Cytokine_summary <- Cytokine_summary %>%
  mutate(Diet = case_when(
    grepl("^P", Pig) ~ "P",
    grepl("^C", Pig) ~ "C",
    grepl("^HM", Pig) ~ "HM",
    grepl("^IF", Pig) ~ "IF",
    TRUE ~ NA_character_ # Add this line to handle other cases or set default value
  ))
```

## What does each diet look like?

```{r Individual distributions}
ggplot(Cytokine_summary, aes(x = Diet, y = IL1B_Jej_pg_mL)) +
  geom_boxplot() +
  geom_jitter()
```

**Figure 1.** Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokine IL1B in the jejunum. Jitter overlay is representative of each individual pig.

### Legend:

C = Farm Control

HM = Human Milk

IF = Infant Formula

P = Lab Control (pilot pigs fed piglet milk replacer)

Now I could create individual ones of these for each column... but we are going to try to work smarter, not harder, and create a new data frame to get all these types of plots into 1 figure.

## Creating 9 plots in 1 figure

First we break up columns 2-10 into their cytokine and their region.

```{r Making new data frame Cytokine_long, include=FALSE}
# Column names look like IL1B_Jej_pg_mL; separate() keeps the first two
# pieces as Cytokine and Region and drops the rest (with a warning)
Cytokine_long <- Cytokine_summary %>%
  pivot_longer(cols = 2:10, names_to = "Reg_Cyt", values_to = "Expression") %>%
  separate(Reg_Cyt, into = c("Cytokine", "Region"), sep = "_", remove = FALSE)
```

Cytokine_long looks good!

Now to put that into a boxplot with jitter overlay. We facet_wrapped in order to make subplots from 1 plot (slice it up for the viewers).

```{r Cytokine_long into boxplot/jitter plot}
ggplot(Cytokine_long, aes(x = Diet, y = Expression)) +
  geom_boxplot() +
  geom_jitter() +
  facet_wrap(Region ~ Cytokine, scales = "free_y")
```

```{r Save 9 in 1}
ggsave("CytokineSummaryBoxplotJitter.pdf")
```

**Figure 2.** Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokines IL1B, IL8, and TNFA in the jejunum, ileum, and colon. Jitter overlay is representative of each individual pig.

Great!

# Onto Assignment 4- Manipulating Marks and Channels

Moving forward from Figure 2, we wanted to see if there were any individual pigs driving differences.

## Plot looking at individual pigs and cytokine z scores

Make new data frame for Cytokine z-scores

```{r Data frame for Cytokine z-scores}
Cytokine_zscore <- Cytokine_summary %>%
  mutate(across(.cols = 2:10, .fns = ~scale(.) %>% as.vector)) # scaling columns 2-10
```

Make a boxplot now:

```{r Boxplot with pig by zscore, warning=FALSE}
Cytokine_zscore_long <- Cytokine_zscore %>%
  pivot_longer(cols = 2:10, names_to = "Reg_Cyt", values_to = "Expression_Z_Scores") %>%
  separate(Reg_Cyt, into = c("Cytokine", "Region"), sep = "_", remove = FALSE)
# We just made the data long! First we break up the columns 2-10 to their cytokine and their region.
# Don't worry about warning- says I have lots of _
```

```{r Boxplot of zscores with long data}
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine))
```

```{r Save Zscore cyotkines simple}
ggsave("ZscoreCytokinesSimple.pdf")
```

**Figure 3.** Box and whisker plots of individual pigs and their overall cytokine expression z scores. Colors represent respective cytokines.

Super cool! Each pig has 9 dots = 9 cytokine readings (3 cytokines and 3 regions). There don't appear to be any real outlier pigs as a whole (i.e. none are extremely inflamed or non-inflamed for any measure).
This shows us there doesn't appear to be hidden structure in my data.

**ACTION** = SEARCH

**TARGET** = ALL DATA

## Now for manipulating...

What if I add the **CHANNEL** of another dimension of **FILL/COLOR** to Figure 3?

```{r Boxplot filled}
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine))
```

**Figure 4.** Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors represent regions.

Actually, this might be quite helpful! Hmm... different colors?

```{r Boxplot filled better colors}
region_colors <- c("black", "gray", "white")

ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine)) +
  scale_fill_manual(values = region_colors)
```

```{r Save Zscore cyotkines complex}
ggsave("ZscoreCytokinesComplex.png")
```

**Figure 5.** Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors (black, gray, white) represent regions.

Helpful or a hindrance, I don't know!

How about changing the **MARK** of **SHAPE**?

```{r Boxplot with triangle jitter}
region_colors <- c("orange", "purple", "yellow")

ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine), shape = 17) + # Change shape to the desired value
  scale_fill_manual(values = region_colors)
```

**Figure 6.** Box and whisker plots of individual pigs and their overall cytokine expression z scores. Point colors represent cytokines and box fill colors represent regions. Points are drawn as triangles.

Alright... I am having too much fun here. Lastly, for my micro-pig-ome data, Barrie and I worked on relative abundance stacked barcharts. I won't go into it too much, but the data I am importing below are microbiome samples (from piglet fecal samples) representative of various timepoints.
I want to manipulate the **CHANNEL** of **COLOR** to be a little more discriminating.

The following was copied from my pre_decontam project:

```{r Playing around with microbiome, message=FALSE}
library(ggplot2)
library(magrittr)
library(dplyr)

# Reading in csv I sent to Barrie
test <- read.csv("test.csv")

ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 4))
# get barchart, default colors, but would love to have x axis ordered. Barrie also modified x axis to be more readable.

test <- test %>%
  mutate(Sample_ID = factor(Sample_ID, levels = unique(Sample_ID[order(typeSample)]))) # order x axis by typeSample

ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0, size = 4)) +
  geom_text(aes(x = Sample_ID, y = Inf, label = typeSample, color = typeSample),
            vjust = 0, angle = 90, hjust = 1, size = 1) # add this line for adding text on top/color/legend, I think
```

**Figure 7.** Piglet microbiome relative abundances with poor color

I am going to manipulate color now!

```{r Barrie Microbiome + Janet Colors}
library(RColorBrewer)

custom_col15 <- c(
  "#FF0000", "#00B0F0", "#FFFF00", "#96D050", "#CC3399", "#375623", "#FFC000", "#0070C0",
  "#990033", "#00B050", "#FF00FF", "#66FF99", "#F96E05", "#FFFF99", "#000000") #,
# "#0000FF", "#FF7C80", "#CC66FF", "#00FF00", "#002060")

ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = custom_col15) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0, size = 4)) +
  geom_text(aes(x = Sample_ID, y = Inf, label = typeSample, color = typeSample),
            vjust = 0, angle = 90, hjust = 1, size = 1) # add this line for adding text on top/color/legend, I think
```

**Figure 7.** Piglet microbiome relative abundances with better color!

# In summary

## Expressiveness and Effectiveness

Shown by Figures 3, 4, 5, and 6.

Figure 3 and 5 = best, 4 and 6 = meh!

## Discriminability

Figure 4 fine-tuned Figure 3 by GI region.

However, Figure 5 tried to add in more colors, and ultimately, this just added to the cognitive load. Not sure how to use Region helpfully without overstimulating/confusing.

## Separability

Head to Figure 7. Gut microbiome stacked barcharts are a little easier to distinguish with the updated color scheme.

## Popout

We were searching for popout in Figure 3. I do notice C2 is a bit of a bigger box compared to the rest.

Thanks! TTFN!